Efficient access methods for very large distributed graph databases

Abstract

• Indexing techniques are essential in large-scale subgraph searching.
• Three new indexing techniques are proposed, which leverage the use of bitmaps.
• A generic framework is provided for distributed filter-then-verify implementations on top of Apache Spark.
• The evaluation shows that different indexes are suitable for different query selectivities.
• A distributed approach is suitable for very large databases and queries with low selectivity.

Subgraph searching is an essential problem in graph databases, but it is also challenging due to the involved subgraph isomorphism NP-Complete sub-problem. Filter-Then-Verify (FTV) methods mitigate performance overheads by using an index to prune out graphs that do not fit the query in a filtering stage, reducing the number of subgraph isomorphism evaluations in a subsequent verification stage. Subgraph searching has to be applied to very large databases (tens of millions of graphs) in real applications such as molecular substructure searching. Previous surveys have identified the FTV solutions GraphGrepSX (GGSX) and CT-Index as the best ones for large databases (thousands of graphs); however, they cannot reach reasonable performance on very large databases (tens of millions of graphs). This paper proposes a generic distributed implementation of FTV solutions. Besides, three new indexing methods that improve on the previous GGSX solution are adapted to be executed in clusters. The evaluation shows how the achieved solutions provide a great performance improvement (between 70% and 90% time reduction) in a centralized configuration and how they may be used to achieve efficient subgraph searching over very large databases in cluster configurations.
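The filter-then-verify idea described in the abstract can be illustrated with a minimal, single-machine sketch. This is not the paper's implementation, and it is neither GGSX nor CT-Index: the graph representation, the choice of edge label pairs as features, and all function names (`build_index`, `filter_candidates`, `is_subgraph`, `search`) are assumptions made for illustration. Each feature maps to a bitmap (here a plain Python integer) with bit i set when graph i contains the feature. Filtering intersects the bitmaps of the query's features, and verification runs a brute-force subgraph isomorphism test on the surviving candidates only.

```python
from itertools import permutations

# Hypothetical toy representation (an assumption of this sketch):
# a graph is (labels, edges), where labels maps vertex id -> label and
# edges is a set of frozenset({u, v}) pairs. Self-loops are not supported.

def make_graph(labels, edge_list):
    return labels, {frozenset(e) for e in edge_list}

def features(graph):
    """Features are the sorted label pairs of the edges (paths of length 1)."""
    labels, edges = graph
    return {tuple(sorted(labels[v] for v in e)) for e in edges}

def build_index(database):
    """Map each feature to a bitmap; bit i is set if graph i has the feature."""
    index = {}
    for i, g in enumerate(database):
        for f in features(g):
            index[f] = index.get(f, 0) | (1 << i)
    return index

def filter_candidates(index, query, n_graphs):
    """Filtering stage: AND the bitmaps of every feature found in the query."""
    bitmap = (1 << n_graphs) - 1  # start with all graphs as candidates
    for f in features(query):
        bitmap &= index.get(f, 0)
    return [i for i in range(n_graphs) if (bitmap >> i) & 1]

def is_subgraph(query, target):
    """Verification stage: brute-force, label-preserving injective mapping
    of query vertices to target vertices that preserves every query edge.
    Exponential time; only meant for tiny examples."""
    q_labels, q_edges = query
    t_labels, t_edges = target
    q_nodes = list(q_labels)
    for image in permutations(t_labels, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[v] == t_labels[mapping[v]] for v in q_nodes)
        edges_ok = all(frozenset(mapping[v] for v in e) in t_edges for e in q_edges)
        if labels_ok and edges_ok:
            return True
    return False

def search(database, index, query):
    """Filter with the index, then verify only the remaining candidates."""
    candidates = filter_candidates(index, query, len(database))
    return [i for i in candidates if is_subgraph(query, database[i])]

# Toy database of three small labeled graphs.
database = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),          # C-C-O path
    make_graph({0: "C", 1: "N"}, [(0, 1)]),                          # C-N
    make_graph({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (2, 3)]),  # C-C and C-O, disconnected
]
index = build_index(database)
query = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])

print(filter_candidates(index, query, len(database)))
print(search(database, index, query))
```

In this toy run the filter keeps graphs 0 and 2, because both contain a C-C edge and a C-O edge, while graph 1 is pruned without any isomorphism test. Verification then rejects graph 2, which has no carbon adjacent to both a carbon and an oxygen. The filter may keep false positives but never discards a true match, which is the property the verification stage relies on.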

Related resources

Chapter 4 QUERY LANGUAGE AND ACCESS METHODS FOR GRAPH DATABASES

With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We present a graph query language (GraphQL) that supports bulk operations on graphs with arbitrary structures and annotated attributes. In this language, graphs are the basic unit of information and each query manipula...

Mining Very Large Databases

Established companies have had decades to accumulate masses of data about their customers, suppliers, and products and services. The rapid pace of e-commerce means that Web startups can become huge enterprises in months, not years, amassing proportionately large databases as they grow. Data mining, also known as knowledge discovery in databases, gives organizations the tools to ...

Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and u...

Quality of Very Large Databases

Analyses and data mining of large computer files are affected by the quality of the information in the files. For large population registers and for files that are created by merging two or more files, duplicate entries must be identified. Duplicate identification can depend on record linkage software that can deal with name, address, and date-of-birth data containing many typographical errors....

Very Large Databases: How Large, How Different?

Soon, the world will need far more truly large databases than any of us ever imagined; yet, ironically, without a lot of care, VLDBs as we know them today may be left along the wayside. The way in which we think about, design and build enormous databases will have to completely change if we are to participate in this revolution. By now everybody, including database people, realizes that the c...

Journal

Journal title: Information Sciences

Year: 2021

ISSN: 0020-0255, 1872-6291

DOI: https://doi.org/10.1016/j.ins.2021.05.047